The practical construction of scalable quantum computer hardware capable of executing non-trivial quantum
algorithms will require the juxtaposition of different types of quantum systems. We propose a modular 
quantum computer architecture with a hierarchy of interactions that can scale to very large numbers of qubits. 
Local entangling quantum gates between qubit memories within a single register are accomplished using natural 
interactions between the qubits, and entanglement between separate registers is completed via a probabilistic 
photonic interface between qubits in different registers, even over large distances. This architecture compares to 
the "multicore" classical information processor, and is suitable for the implementation of complex quantum cir- 
cuits utilizing the flexible connectivity provided by a reconfigurable photonic interconnect network. We further 
show that this architecture can be made fault-tolerant, a prerequisite for scalability. All of the rudiments of this 
architecture have been demonstrated in small-scale trapped ion systems, and we speculate on the technological 
hurdles ahead in order to realize such a system. 



PACS numbers: 03.67.-a, 42.50.Ex, 89.20.Ff 
Introduction 

A quantum computer is composed of at least two quantum 
systems that serve critical functions: a reliable quantum mem- 
ory for hosting and manipulating coherent quantum superpo- 
sitions, and a quantum bus for the conveyance of quantum 
information between memories. Quantum memories are typ- 
ically formed out of matter such as individual atoms, spins 
localized at quantum dots or impurities in solids, or superconducting
junctions. On the other hand, the quantum
bus typically involves propagating quantum degrees of free- 
dom such as electromagnetic fields (photons) or lattice vibra- 
tions (phonons). A suitable and controllable interaction be- 
tween the memory and the bus is necessary to efficiently ex- 
ecute a prescribed quantum algorithm via propagation of en- 
tanglement. The current challenge in any quantum computer 
architecture is to scale the system to very large sizes, where 
errors are typically caused by speed limitations and decoher- 
ence of the quantum bus or its interaction with the memory. 
The most advanced quantum bit (qubit) networks have thus 
been established only in very small systems, such as individual
atomic ions bussed by the local Coulomb interaction [2]
or superconducting Josephson junctions coupled capacitively
or through microwave striplines. In this paper, we
propose a hierarchy of quantum bus levels in a new modular 
quantum computer architecture that may allow the scaling of 
high performance quantum memories to useful sizes. Unlike 
previous related proposals, we show this architecture is
fault-tolerant, reconfigurable, and based on technology that is 
currently available. 

We specialize to the use of atomic ion qubit memories, although
the general architecture presented here can also be
adapted to other optically active quantum systems such as
quantum dots, neutral atoms, or NV-diamond. Qubits
stored in ions enjoy a level of coherence that is unmatched in
any other physical system, which is why such states
are also used as high performance atomic clocks. Moreover,
atomic ions can be initialized and detected with nearly
perfect accuracy using conventional optical pumping and 
state-dependent fluorescence techniques. There have been 
many successful demonstrations of controlled entanglement 
of several-ion quantum registers in the past decade involving
the use of qubit state-dependent forces supplied by laser
beams [2, 10]. These experiments exploit the collective motion
of a small number of trapped ion qubits, but with more
than 10-100 ions, such operations are more susceptible to
external noise, decoherence, or speed limitations.

One promising approach to scaling trapped ion qubits is 
the quantum charge-coupled device (QCCD), which involves 
the sequential entanglement of small numbers of ions through 
their collective motion, and the classical shuttling of individual
ions between different trapping zones to propagate the entanglement.
This approach involves advanced ion trap structures, perhaps with many
times more discrete electrodes than trapped ion qubits, and therefore
motivates the use of micrometer-scale surface traps [13-15] and novel
fabrication techniques [16-18]. The shuttling solution also requires
exquisite control of the atomic ion positions during shuttling,
may require multiple atomic species to act as "refrigerator"
ions to quench the excess motion from shuttling operations [19],
will likely involve methods to mitigate the effect of ion
heating from the nearby electrodes [20-22], and cannot easily
be extended over large distances for quantum communications
applications. The QCCD approach will push current
state-of-the-art quantum information processing experiments
to territories where elementary quantum error correction and
simple quantum algorithms can be implemented, but further
scaling may be challenging due to the complexity of interconnects,
diffraction of optical beams, and the extensive hardware
required for qubit control.

Figure 1: Hierarchical modular quantum computer architecture hosting
$N = N_{\rm ELU} N_q$ qubits. (a) The elementary logic unit (ELU)
consists of a register of $N_q$ trapped atomic ion qubits, whereby
entangling quantum logic gates are mediated through the local Coulomb
interaction between qubits. (b) One or more atomic qubits within
each of the $N_{\rm ELU}$ registers are coupled to photonic quantum
channels, and through a reconfigurable optical multiplexing switch
(center), fiber beamsplitters and position sensitive imager (right),
qubits between different registers can be entangled.

Here we propose a modular universal scalable ion trap 
quantum computer (MUSIQC) architecture that may enable 
construction of quantum processors with up to $10^6$ qubits utilizing
component technologies that have already been demonstrated.
This architecture features two elements: stable
trapped ion multi-qubit registers that can further be connected 
with ion shuttling, and scalable photonic interconnects that 
can connect these registers in a flexible configuration over 
large distances. We articulate substantial architectural advantages
of this approach that allow significant speedup and
resource reductions in quantum circuit execution over other
hardware architectures, enabled by the ability to operate quan- 
tum gates between qubits throughout the entire processor re- 
gardless of their relative location. Finally, we prove how such 
a quantum network can support fault-tolerant error correc- 
tion even in the face of probabilistic interconnects, and dis- 
cuss the technological developments necessary for its realiza- 
tion. While we focus our discussions on quantum registers 
composed of trapped atomic ions, this architecture can be ex- 
tended to other quantum platforms with strong optical transi- 
tions. 



The Modular Elementary Logic Unit (ELU) 

The base unit of MUSIQC is a collection of $N_q$ qubit memories
with local interactions, called the Elementary Logic Unit
(ELU). Quantum logic operations within the ELU are ideally
fast and deterministic, with error rates sufficiently small that
fault-tolerant error correction within an ELU is possible [23].
Here, we represent the ELU with a crystal of $N_q \gg 1$ trapped
atomic ions as shown in Fig. 2, with each qubit comprised of
internal energy levels of each ion, labeled as $|\uparrow\rangle$ and
$|\downarrow\rangle$, separated by frequency $\omega_0$. We assume the
qubit levels are coupled through an atomic dipole operator
$\hat{\mu} = \mu(|\uparrow\rangle\langle\downarrow| + |\downarrow\rangle\langle\uparrow|)$.
The ions interact through their external collective modes of
quantum harmonic motion. Such phonons can be used to
mediate entangling gates through application of qubit-state-dependent
optical or microwave dipole forces [24-26]. There
are many known protocols for phonon-based gates between
ions, and here we summarize the main points relevant to the
size of the ELU and the larger architecture.

An externally applied near-resonant running wave field
with amplitude $E(x) = E_0 e^{ikx}$ and wavenumber $k$ couples
to the atomic dipole through the interaction Hamiltonian
$H = -\hat{\mu}E(x)$, and by suitably tuning the field near sidebands
induced by the harmonic motion of the ions [12], a qubit state-dependent
force results. In this way, qubits can be mapped
onto phonon states [12, 24] and then onto other qubits for
entangling operations with characteristic speed $R_{\rm gate} = \eta\Omega$,
where $\eta = \sqrt{\hbar k^2/(2 m_0 N_q \omega)}$ is the Lamb-Dicke parameter,
$m_0$ is the mass of each ion, $\omega$ the frequency of harmonic
oscillation of the collective phonon mode, and $\Omega = \mu E_0/2\hbar$ is
the Rabi frequency of the atomic dipole independent of motion.
For optical Raman transitions between qubit states (e.g.,
atomic hyperfine ground states) [12], two fields are each detuned
by $\Delta$ from an excited state of linewidth $\gamma \ll \Delta$, and
when their difference frequency is near resonant with the qubit
frequency splitting $\omega_0$, we use instead $\Omega = (\mu E_0)^2/(2\hbar^2\Delta)$.

The typical gate speed within an ELU therefore slows down
with the number of qubits $N_q$ as $R \sim N_q^{-1/2}$. For large
crystals, there will be crosstalk between the many modes of collective
motion. However, through the use of pulse-shaping techniques [27],
the crosstalk errors need not be debilitating, although
the effective speed of a gate will again slow down with
size $N_q$. Background errors such as the decoherence (heating)
of the motional modes [20] or fluctuating fields that add
random phases to the qubits will become important at longer
times, thus there will be practical limits on the size of the ELU
for the performance of faithful quantum gates. In particular,
very large chains may require periodic "refrigerator" ions, perhaps
of a different isotope or species, that can quench heating.
We estimate that ELUs ranging from $N_q = 10$ to $100$
should be possible. More than one ELU chain can be integrated
into a single chip by employing ion shuttling through
more complex ion trap structures. Such extended ELUs
(EELUs) consisting of $N_E$ ELU chains can contain a total of
$N_q N_E = 20$ to $1000$ physical qubits. For simplicity, we focus
the remainder of the article on systems with one ELU per
chip ($N_E = 1$).

Figure 2: Elementary Logic Unit (ELU) composed of a single crystal
of $N_q$ trapped atomic ion qubits coupled through their collective
motion. (a) Classical laser fields impart qubit state-dependent forces
on one or more ions, effecting entangling quantum gates between the
memory qubits. (b) One or more of the ions (rightmost in the figure)
are coupled to a photonic interface, where a classical laser pulse
maps the state of these communication qubits onto the state of a single
photon (e.g., polarization or frequency), which then propagates
along an optical fiber to be interfaced with other ELUs.
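
The dependence of the gate speed on ELU size can be made concrete with a short numerical sketch. It evaluates the Lamb-Dicke parameter and the characteristic speed $R_{\rm gate} = \eta\Omega$ for a few register sizes. The wavelength, ion mass, mode frequency and Rabi frequency below are hypothetical illustration values that do not come from the text; only the functional form is taken from the discussion above.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J*s
AMU = 1.66053906660e-27     # atomic mass unit, kg


def lamb_dicke(wavelength_m, ion_mass_kg, mode_freq_hz, n_ions):
    """Lamb-Dicke parameter eta = sqrt(hbar k^2 / (2 m0 N_q omega))."""
    k = 2.0 * math.pi / wavelength_m          # optical wavenumber
    omega = 2.0 * math.pi * mode_freq_hz      # angular mode frequency
    return math.sqrt(HBAR * k**2 / (2.0 * ion_mass_kg * n_ions * omega))


def gate_speed(rabi_freq, wavelength_m, ion_mass_kg, mode_freq_hz, n_ions):
    """Characteristic gate speed R_gate = eta * Omega."""
    eta = lamb_dicke(wavelength_m, ion_mass_kg, mode_freq_hz, n_ions)
    return eta * rabi_freq


# Hypothetical example values (assumptions, not taken from the text).
WAVELENGTH = 369.5e-9            # optical wavelength, m
MASS = 171 * AMU                 # ion mass, kg
MODE_FREQ = 1.0e6                # collective mode frequency, Hz
RABI = 2.0 * math.pi * 1.0e6     # Rabi frequency, rad/s

for n in (1, 10, 100):
    eta = lamb_dicke(WAVELENGTH, MASS, MODE_FREQ, n)
    rate = gate_speed(RABI, WAVELENGTH, MASS, MODE_FREQ, n)
    print(f"N_q = {n:3d}: eta = {eta:.4f}, R_gate = {rate:.3e} rad/s")
```

Whatever values are chosen, going from 1 to 100 ions reduces the speed by a factor of 10, which is the $N_q^{-1/2}$ scaling that limits the practical size of an ELU.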



Probabilistic Linking of ELUs 

A pair of qubit registers (ELUs or EELUs) can be entan- 
gled with each other using propagating photons emitted by 
ions from each qubit register, designated to be "communication
qubits." In this scheme, the communication qubit is
driven to an excited state with fast laser pulses whose duration
$\tau_e \ll 1/\gamma$, so that at most one photon emerges from each
qubit following appropriate radiative selection rules. When
photon(s) from two separate communication qubits are mode-matched
and interfere on a 50/50 beamsplitter, detectors on
the output modes of the beamsplitter can herald the creation
of entanglement between the memory qubits [29-32].

We consider two types of photonic connections, characterized
by the number of total photons used in the entanglement
protocol between two separate memory qubits [6].
For type I connections (shown in Fig. 3a), each memory
with an index $i$ (or $j$) is weakly excited with probability
$p_e \ll 1$ and the state of the memory+photonic qubit is written
$\sqrt{1-p_e}\,|\downarrow\rangle_i|0\rangle_i + e^{ikx_i}\sqrt{p_e}\,|\uparrow\rangle_i|1\rangle_i$,
where $|n\rangle_i$ denotes the state of $n$ photons radiating from the
memory into an optical mode $i$, $x_i$ is the path length from the
emitter $i$ to a beamsplitter, and $k$ the optical wavenumber [29].
When two memories $i$ and $j$ are excited in this way and the photons
interfere at the beamsplitter, the detection of a single photon in
either detector placed at the two output ports of the beamsplitter
heralds the creation of the state
$e^{ikx_j}|\downarrow\rangle_i|\uparrow\rangle_j \pm e^{ikx_i}|\uparrow\rangle_i|\downarrow\rangle_j$
(up to normalization) with probability $p = p_e F \eta_D$, where $F$ is
the fractional solid angle of emission collected, $\eta_D$ is the
detector efficiency including any losses between emitter and detector,
and the sign in this state is determined by which one of the two
detectors fires. Following the heralding of a single photon, the (small)
probability of errors from double excitation and detector dark
counts are given respectively by $p_e^2$ and $R_{\rm dark}/\gamma$, where
$R_{\rm dark}$ is the rate of detector dark counts. For type I
connections to be useful, the relative optical path length $x_i - x_j$
must be stable to much better than the optical wavelength $\sim 1/k$.

Figure 3: (a) Type I interference from photons emitted from two
quantum memories. Each memory is weakly excited so that single
photon emission has a very small probability yet is correlated with
the final qubit state. The output photonic channels are mode-matched
with a 50/50 beamsplitter and subsequent detection of a photon from
either output port heralds the entanglement of the qubit memories.
The probability of two photons present in the system is much smaller
than that of detecting a single photon. (b) Type II interference
involves the emission of one photon from each memory, where the
internal state of the photon (e.g. its color) is correlated with the qubit
state. After two photon interference at the beamsplitter, coincidence
detection of photons at the two detectors heralds the entanglement of
the qubit memories.

For type II connections (shown in Fig. 3b), each memory
is excited with near unit probability $p_e \approx 1$ and
the single photon carries its qubit through two distinguishable
internal photonic states (e.g., polarization or optical frequency).
For example, the state of the system containing both
memory and photonic qubits is written as
$e^{ik_\uparrow x_i}|\uparrow\rangle_i|\nu_\uparrow\rangle_i + e^{ik_\downarrow x_i}|\downarrow\rangle_i|\nu_\downarrow\rangle_i$
(up to normalization), where $|\nu_\uparrow\rangle_i$ and $|\nu_\downarrow\rangle_i$
denote the frequency qubit states of a single photon emitted by the
$i$-th memory with respective wavenumbers $k_\uparrow$ and $k_\downarrow$
and associated optical frequencies $\nu_\uparrow$ and $\nu_\downarrow$, with
$|\nu_\uparrow - \nu_\downarrow| = \omega_0 \gg \gamma$
so that these two frequencies are distinguishable. When
two memories $i$ and $j$ are excited in this way, we herald
the connection of the memories by the joint (coincidence)
detection of photons at output detectors, creating the state
$e^{i(k_\downarrow x_i + k_\uparrow x_j)}|\downarrow\rangle_i|\uparrow\rangle_j - e^{i(k_\uparrow x_i + k_\downarrow x_j)}|\uparrow\rangle_i|\downarrow\rangle_j$
(up to normalization) with probability $p = (p_e F \eta_D)^2/2$.

The success probability of the 2-photon type II connection
may be lower than that of the type I connection for small light
collection fractions, but type II connections are much less sensitive
to optical path length fluctuations, with the relative path
length $x_i - x_j$ required to be stable only at the level of the
wavelength associated with the difference frequency, $2\pi c/\omega_0$,
of the photonic frequency qubit, which is typically at the centimeter
scale for hyperfine-encoded memory qubits. In either
case, the mean connection time is given by $\tau_E = 1/(Rp)$,
where $R$ is the repetition rate of the initialization/excitation
process. For atomic transitions, $R \sim 0.1(\gamma/2\pi)$, and for
typical free-space light collection ($F \sim 10^{-3}$) and taking
$\eta_D \sim 0.2$, we find for a type I connection $\tau_E \sim 4$ msec and
for a type II connection $\tau_E \sim 4$ sec, where we have assumed
$\gamma/2\pi = 20$ MHz. Type II connections eventually outperform
type I connections with more efficient light collection, which can
be accomplished by integrating optical elements with the ion
trap structure without any fundamental loss in fidelity [33].
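
The connection-time estimate can be checked with a few lines of arithmetic. The sketch below evaluates $\tau_E = 1/(Rp)$ for both connection types. Two inputs are assumptions rather than statements from the text: the type I excitation probability is set to $p_e = 0.1$ (the text only requires it to be small), and the repetition rate is evaluated as $R = 0.1\gamma$ with $\gamma = 2\pi \times 20$ MHz. With these choices the formula gives values close to the quoted 4 msec and 4 sec; any other reading of $R$ rescales both times by the same factor.

```python
import math


def connection_probability(p_e, collection_fraction, detector_eff, n_photons):
    """Heralding probability per attempt for a one- or two-photon connection."""
    single = p_e * collection_fraction * detector_eff
    if n_photons == 1:
        return single                 # type I:  p = p_e F eta_D
    if n_photons == 2:
        return single**2 / 2.0        # type II: p = (p_e F eta_D)^2 / 2
    raise ValueError("n_photons must be 1 (type I) or 2 (type II)")


def mean_connection_time(rep_rate, p):
    """Mean connection time tau_E = 1 / (R p)."""
    return 1.0 / (rep_rate * p)


GAMMA = 2.0 * math.pi * 20e6   # linewidth, from gamma/2pi = 20 MHz
R = 0.1 * GAMMA                # ASSUMPTION: repetition rate taken as 0.1*gamma
F = 1e-3                       # collected solid-angle fraction (from the text)
ETA_D = 0.2                    # detector efficiency (from the text)
P_E_TYPE_I = 0.1               # ASSUMPTION: weak-excitation probability

tau_I = mean_connection_time(R, connection_probability(P_E_TYPE_I, F, ETA_D, 1))
tau_II = mean_connection_time(R, connection_probability(1.0, F, ETA_D, 2))
print(f"type I : tau_E = {tau_I * 1e3:.2f} ms")
print(f"type II: tau_E = {tau_II:.2f} s")
```

Comparing the two probability expressions shows that the two-photon scheme becomes the faster one once $F\eta_D > 2p_e$, which is the sense in which improved light collection favors type II connections.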

In practice, steps must be taken to isolate the communi- 
cation qubit from the memory qubits so that scattered light 
from the excitation laser does not disturb the spectator mem- 
ories and also the emitted photons do not themselves affect 
the memories. It may be necessary to physically separate or 
shuttle the communication qubit away from the others, invoking
some of the techniques from the QCCD approach. However,
this crosstalk can also be eliminated by utilizing a different
atomic species for the communication qubit [34], so
that the excitation and emitted light is sufficiently far from 
the memory qubit optical resonance to not cause decoherence. 
The communication qubits need not have excellent quantum 
memory characteristics, because once the entanglement is es- 
tablished between the photonic qubits in different ELUs, they 
can immediately be swapped with neighboring memory qubits 
in each chain. 

The MUSIQC architecture allows a large number $N_{\rm ELU}$
of ELUs (or EELUs) to be connected with each other using
such photonic channels, as shown in Fig. 1. The connection
is made through an optical crossconnect (OXC) switch
with $N_{\rm ELU}$ input and output ports. The photon emitted from
the communication qubit in each ELU is collected into a
single-mode fiber and directed to a corresponding input port
of the OXC switch. Up to $N_{\rm ELU}/2$ Bell state detectors, each
comprised of two fibers interfering on a beamsplitter and
two detectors, are connected to the output ports of the OXC
switch. The OXC switch is capable of providing an optical
path from any input fiber to any output fiber that is
not already connected to another input fiber. An ideal OXC
switch achieves full non-blocking connectivity with uniform
optical path lengths. This optical network provides a fully
reconfigurable interconnect network for the photonic qubits,
allowing entanglement generation between any pair of ELUs in
the processor with up to $N_{\rm ELU}/2$ such operations running in
parallel. OXC switches that support 200-1,100 ports utilizing
micro-electromechanical systems (MEMS) technology
have been constructed and are readily available [35, 36]. In
practice, the photon detection can be accomplished in parallel
with a conventional charge-coupled-device (CCD) imager,
with pairs of regions on the CCD associated with particular
pairs of output ports from the fiber beamsplitters, as shown in
Fig. 1.
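
The parallelism offered by the switch can be illustrated with a small scheduling sketch. It packs a list of requested ELU-to-ELU links into rounds such that each ELU takes part in at most one link per round, so no round exceeds $N_{\rm ELU}/2$ links. The function name `schedule_links`, the greedy strategy, and the assumption of one communication qubit per ELU are all hypothetical choices made for illustration; the text does not prescribe a scheduling algorithm.

```python
def schedule_links(requests, n_elu):
    """Greedily pack ELU-pair entanglement requests into parallel rounds.

    Each round uses every ELU at most once (one communication qubit per
    ELU is assumed), so a round holds at most n_elu // 2 links, matching
    the number of Bell-state detectors behind the switch.
    """
    rounds = []  # each round is a list of (i, j) pairs served in parallel
    for i, j in requests:
        if i == j or not (0 <= i < n_elu and 0 <= j < n_elu):
            raise ValueError(f"invalid ELU pair ({i}, {j})")
        for links in rounds:
            busy = {elu for pair in links for elu in pair}
            if i not in busy and j not in busy:
                links.append((i, j))   # fits into an existing round
                break
        else:
            rounds.append([(i, j)])    # open a new round
    return rounds


# Five requested links among four ELUs.
demo = schedule_links([(0, 1), (2, 3), (0, 2), (1, 3), (0, 3)], n_elu=4)
for step, links in enumerate(demo, start=1):
    print(f"round {step}: {links}")
```

A greedy packing is not guaranteed to use the minimum number of rounds; it only shows how a non-blocking switch lets disjoint pairs attempt entanglement simultaneously.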

Quantum Computing in a Modular Architecture 

In the circuit model of quantum computation, execution 
of two-qubit gates creates the entanglement necessary to exploit
the power of quantum physics in computation [23]. In
the alternate model of measurement-based cluster-state quantum
computation, all of the entanglement is generated at the
beginning of the computation, followed by conditional measurements
of the qubits [37]. The MUSIQC architecture presented
here follows the circuit model of computation within
each ELU, but the probabilistic connection between ELUs is 
carried out by generation of entangled Bell pairs similar to the 
cluster-state computation model. In this sense, MUSIQC re- 
alizes a hybrid model of quantum computation, driven by the 
generation rate and burn rate of entanglement. In the event
the generation rate of entangled Bell pairs between ELUs is
lower than the burn rate, each ELU would require the capacity
to store enough initial entanglement that the stored reserve,
together with ongoing generation, lasts until
the end of the computation. The hybrid nature of MUSIQC
provides a unique hardware platform with three distinct ad- 
vantages: naturally parallel operation of each ELU, constant 
timescale to perform operations between distant qubits, and 
moderate ELU size adequate for practical implementation. 
One can further reduce the entanglement generation time by 
time-division multiplexing (TDM) the communication ports 
at the expense of added qubits. Moreover, the temporal mis- 
match between the remote entanglement generation and lo- 
cal gates is reduced as the requirement of error correction in- 
creases the logical gate time. 
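
The balance between generation rate and burn rate can be sketched as simple bookkeeping. The function below estimates how many Bell pairs would have to be stored before the computation starts so that the supply never runs out. Constant rates over the whole computation are an assumption made here for illustration, and the name `required_initial_pairs` is hypothetical.

```python
import math


def required_initial_pairs(gen_rate, burn_rate, duration):
    """Bell pairs to store up front so the supply lasts for `duration`.

    Assumes constant generation and burn rates (pairs per second).
    If generation keeps up with consumption, no reserve is needed.
    """
    deficit = (burn_rate - gen_rate) * duration
    return max(0, math.ceil(deficit))


# Hypothetical rates: 100 pairs/s generated, 150 pairs/s consumed, 2 s run.
print(required_initial_pairs(100.0, 150.0, 2.0))
```

Time-division multiplexing of the communication ports raises the effective generation rate and therefore lowers this reserve, at the cost of the extra qubits mentioned above.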

For a complex quantum algorithm involving $n$ bits, logical
operations between spatially distant qubit pairs are necessary.
In a hardware architecture where only local gate operations
are allowed (e.g., nearest neighbor gates), performing gate
operations between two (logical) qubits separated by long distances
could lead to communication times polynomial in the
distance between qubits, $O(n^k)$. When a large number of parallel
operations is available, one can employ a nested entanglement
swapping protocol to efficiently distribute entanglement
with communication times scaling only logarithmically
as a function of communication distance. The procedure requires
extra qubits used to construct quantum buses for long-distance
entanglement distribution, and was referred to as the
Quantum Logic Array (QLA) [38]. Despite the slow entanglement
generation times, the performance of the MUSIQC architecture
is comparable with QLA (and its variations [39]), with
substantial advantage in required resources and feasibility for
implementation. Figure 4 shows the comparison of execution
times and resource requirements for executing an $n$-bit adder
with one level of quantum error correction in various architectures
(the assumptions and outline of the performance estimation
are given in Appendix III). When only local interactions
are available without dedicated buses for entanglement distribution,
a quantum ripple-carry adder is the adequate adder of
choice [40], for which the execution time goes as $O(n)$. For
QLA and MUSIQC architectures, one can implement a quantum
carry-lookahead adder that is capable of completing the
addition in logarithmic time. Since quantum adder circuits
form the basis of the modular exponentiation circuit that
dominates the execution time of Shor's algorithm, the speed advantage
in adder circuits translates directly to faster execution
of Shor's algorithm.

Figure 4: Performance comparisons of quantum ripple-carry adder
(QRCA) on a nearest-neighbor architecture, and quantum carry-lookahead
adder (QCLA) on QLA and MUSIQC architectures.
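
The difference between linear and logarithmic scaling can be illustrated with a short sketch. It counts the nesting levels needed to join a chain of elementary links by pairwise entanglement swapping and prints them next to the linear count. All constants are set to one, so the numbers show growth rates only and are not the execution times of Figure 4; the function name `swap_levels` is hypothetical.

```python
def swap_levels(n_links):
    """Nesting levels needed to join n_links elementary links pairwise.

    Each level halves the number of segments, so the result is
    ceil(log2(n_links)), computed here with integer arithmetic.
    """
    if n_links < 1:
        raise ValueError("n_links must be at least 1")
    return (n_links - 1).bit_length()


# Linear growth (e.g. ripple-carry depth or hop-by-hop transport)
# versus logarithmic growth (nested swapping or carry-lookahead depth).
for n in (16, 128, 1024):
    print(f"n = {n:5d}: linear ~ {n:5d} steps, logarithmic ~ {swap_levels(n):2d} levels")
```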

In the example provided, the MUSIQC architecture takes
50% longer to carry out the quantum adder circuit compared
to the QLA architecture due to slow entanglement generation
times, but uses only about 15% of the physical resources (in total
number of physical qubits and parallel operations necessary).
The difference arises from the overhead qubits that QLA needs to
construct the quantum buses. Furthermore, the total size of
the single ELU necessary to implement the QLA architecture
grows very quickly (over $10^6$ physical qubits for a 1024-bit
adder), while the ELU size in the MUSIQC architecture is fixed at
moderate numbers ($\approx$ 1,500 ELUs with about 100 qubits per
ELU). Therefore, the MUSIQC architecture substantially lowers
the practical technological barrier in integration levels necessary
for a large-scale quantum computer.



Fault Tolerance of Probabilistic Photonic Gates 

Naively, it would appear that the average entanglement creation
time $\tau_E$ must be much smaller than the decoherence time
scale $\tau_D$ for fault tolerance, but as shown in Appendix
II, scalable fault-tolerant quantum computation is possible for
any ratio $\tau_E/\tau_D$, even in the presence of additional gate
errors. While large values of $\tau_E/\tau_D$ would lead to impractical
levels of overhead in qubits and time (similar to the case
of conventional quantum fault tolerance near threshold error
levels [42]), this result is still remarkable and indicates that
fault tolerance is always possible in this architecture. Here
we mainly consider the case where $\tau_E/\tau_D \ll 1$, where fault
tolerant coding is more practical.

Figure 5: Three steps of creating a three-dimensional cluster state in
the MUSIQC architecture, for fast entangling gates. Step 1: Creation
of Bell pairs between different ELUs, all in parallel. Step 2: CNOT
gates (head of arrow: target qubit, tail of arrow: control qubit). Step
3: Measuring of 3 out of 4 qubits per ELU. If the ELU represents a
face (edge) qubit in the underlying lattice, the measurements are in
the $\sigma_z$- ($\sigma_x$-) basis. The resulting state is a 3D cluster state, up to
Hadamard gates on the edge qubits.

When each ELU is large enough to accommodate logical
qubits encoded with a conventional error correcting code, one
can implement a full fault-tolerant procedure within an ELU as
in the example presented in the previous section. When the
ELUs are too small to fit the logical qubits, fault tolerance can
be achieved by mapping to three-dimensional cluster states, a
known approach for supporting fault-tolerant universal quantum
computation [43]. This type of encoding is well-matched
to the MUSIQC architecture, because the small degree of the
cluster-state interaction graph leads to small ELUs.

For $\tau_E \ll \tau_D$, the 3D cluster state with qubits on the faces
and edges of a three-dimensional lattice can be created using
the procedure displayed in Fig. 5 and described in more detail
in Appendix II. The procedure consists of three basic
steps: creation of Bell states between different ELUs via the
photonic link; CNOT gates within each ELU; and local
measurement of 3 out of 4 qubits in each ELU.
As can be shown using standard stabilizer arguments,
the resulting state is a 3D cluster state, up to local Hadamard
gates on the edge qubits. A refined scheduling of operations,
where no qubit is ever acted on by more than one (even commuting)
gate at a time and qubits are never idle, is described
in Appendix I.

Fault-tolerance threshold. We assume the following error
model. (1) Every gate operation, i.e. preparation and measurement
of individual qubits, gates within an ELU, and Bell pair
creation between different ELUs, can be achieved within
a clock cycle of duration $T$. An erroneous one-qubit (two-qubit)
gate is modeled by the perfect gate followed by a partially
depolarizing one-qubit (two-qubit) channel. In the one-qubit
channel, $X$, $Y$, and $Z$ errors each occur with probability
$\epsilon/3$. In the two-qubit channel, each of the 15 possible errors
$X_1, X_2, X_1X_2, \ldots, Z_1Z_2$ occurs with a probability of $\epsilon/15$.
All gates have the same error $\epsilon$. (2) In addition, the effect of
decoherence per time step $T$ is described by local probabilistic
Pauli errors $X$, $Y$, $Z$, each happening with a probability
$T/3\tau_D$. Under these assumptions, the known error threshold
for fault-tolerant quantum computation with 3D cluster states
translates into the condition

$$\epsilon + \frac{55}{62}\frac{T}{\tau_D} < 2.9 \times 10^{-3}, \qquad (1)$$

as derived in Appendix I.
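
Condition (1) is easy to evaluate for a given set of hardware parameters. The sketch below computes the left-hand side and the largest gate error that still satisfies the condition. The function names and the example values of the clock cycle and decoherence time are hypothetical; only the coefficients 55/62 and $2.9 \times 10^{-3}$ are taken from the condition itself.

```python
THRESHOLD = 2.9e-3                 # right-hand side of condition (1)
DECOHERENCE_WEIGHT = 55.0 / 62.0   # prefactor of T / tau_D in condition (1)


def effective_error(eps, clock_T, tau_D):
    """Left-hand side of condition (1): eps + (55/62) * T / tau_D."""
    return eps + DECOHERENCE_WEIGHT * clock_T / tau_D


def below_threshold(eps, clock_T, tau_D):
    """True if the gate error and decoherence satisfy condition (1)."""
    return effective_error(eps, clock_T, tau_D) < THRESHOLD


def max_gate_error(clock_T, tau_D):
    """Largest gate error eps allowed by condition (1), or 0.0 if none."""
    return max(0.0, THRESHOLD - DECOHERENCE_WEIGHT * clock_T / tau_D)


# Hypothetical example: 1 microsecond clock cycle, 1 ms decoherence time.
T_CLOCK = 1e-6
TAU_D = 1e-3
print(below_threshold(1e-3, T_CLOCK, TAU_D))
print(f"max gate error: {max_gate_error(T_CLOCK, TAU_D):.3e}")
```

The trade-off is linear: every increase of $T/\tau_D$ by $10^{-3}$ lowers the tolerable gate error by about $0.89 \times 10^{-3}$.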

Overhead. The operational cost of creating a 3D cluster
state and then locally measuring it for the purpose of computation
is 24 gates per elementary cell in the standard setting,
and 54 gates per elementary cell in MUSIQC. The overhead
of the MUSIQC architecture over fault-tolerant cluster state
computation is thus a constant factor (54/24 = 2.25). The operational
overhead for fault tolerance in the latter is poly-logarithmic [43].
For a discussion of operational cost in absolute terms, see [44].

The above construction fails for $\tau_E/\tau_D > 1$, where
decoherence occurs while waiting for Bell-pair entanglement.
However, scalable fault-tolerant computing can still be
achieved in the MUSIQC architecture for any ratio $\tau_E/\tau_D$,
even for ELUs of only 3 qubits. This result is based on an
alternative construction described in Appendix II. It makes
use of a defining feature of the MUSIQC architecture, that
the average time to create entanglement is independent of distance.
Compared to the case of $\tau_E \ll \tau_D$, the operational
cost of fault tolerance is increased by a factor that depends
strongly on $\tau_E/\tau_D$ but is independent of the size of the
computation. Thus, while quantum computation becomes more
costly when $\tau_E > \tau_D$, it remains scalable. This surprising
result shows that there is no hard threshold for the ratio $\tau_E/\tau_D$,
and opens up the possibility for efficient fault tolerance
constructions with slow entangling gates.

Outlook 

The success of silicon-based information processors in the 
past five decades hinged upon the scalability of integrated circuit
(IC) technology characterized by Moore's law [45]. IC
technology integrated all the components necessary to con- 
struct a functional circuit, using the same conceptual approach 
over many orders of magnitude in integration levels. The hier- 
archical modular ion trap quantum computer architecture dis- 
cussed here promises scalability, not only in the number of 
physical systems (trapped ions) that represent the qubits, but 
also in the entire control structure to manipulate each qubit at 
such integration levels. 

The technology necessary to realize each and every component
of the MUSIQC architecture is either already available
or within reach. The recognition that ion traps can be mapped
onto a two dimensional surface that can be fabricated using
standard silicon microfabrication technologies [13-15] has
led to a rapid development in complex surface trap technology.
Present-day trap development exploits extensive
electromagnetic simulation codes to design optimized trap
structures and control voltages, allowing sufficient control and
stability of ion positioning. Integration of optical components
into such microfabricated traps will enable stronger interaction
between the ions and photons for better photon collection
and qubit detection [46] through the use of high numerical
aperture optics or integration of an optical cavity with the ion
trap [33]. Moreover, electro-optic and MEMS-based beam
steering systems allow the addressing of individual atoms in
a chain with tightly focused laser beams [47, 48], and an optical
interconnect network can be constructed using large-scale
all-optical crossconnect switches [35]. While technical challenges
such as the operation of narrowband (typically ultraviolet)
lasers or the presence of residual heating of ion motion
still remain, they do not appear to be fundamental roadblocks
to scalability. Within the MUSIQC architecture we
have access to a full suite of technologies to realize the ELU
in a scalable manner, where the detailed parameters of the
architecture such as the number of ions per ELU, the number of
ELUs, or the number of photonic interfaces per ELU can be
adapted to optimize performance of the quantum computer.

APPENDIX I: ANALYSIS OF FAULT-TOLERANCE FOR
FAST ENTANGLING GATES

Scheduling. The operations can be scheduled such that (a)
qubits are never idle, and (b) no qubit is acted upon by multiple
gates (even commuting ones) at the same time. The latter
is required in some proposals for realizing quantum gates
with ion qubits. To this end, the schedule of Ref. [43] for 3D
cluster state generation is adapted to the MUSIQC architecture,
and the three-step sequence shown in Fig. 5 of the main text
is expanded into the five-step sequence shown here in Fig. 6.
In steps 1-3 the Bell pairs across the ELUs are created.
In steps 2-4 the CNOTs within each ELU are performed,
and in steps 3-5 three qubits in each ELU are measured.
The sequence of operations is such that each of the three
ancilla qubits in every ELU lives for only three time steps:
initialization (to half of a Bell pair), CNOT, and measurement.
No qubit is ever idle in this protocol.
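As an illustration, the staggering of the ancilla operations can be checked with a few lines of Python. This is a minimal sketch, not part of the original analysis: it assumes that ancilla k (k = 1, 2, 3) is initialized in step k and then undergoes its CNOT and measurement in the two following steps, which reproduces the step ranges quoted above. The function names are hypothetical.

```python
# Minimal model of the ancilla schedule within one ELU. The staggering rule
# (ancilla k starts in step k) is an assumption consistent with the text.
OPS = ("init_bell_half", "cnot", "measure")


def ancilla_schedule(n_ancillas=3):
    """Return {ancilla index: {time step: operation}}."""
    return {
        k: {k + offset: op for offset, op in enumerate(OPS)}
        for k in range(1, n_ancillas + 1)
    }


def steps_with(schedule, op):
    """Sorted list of time steps in which operation `op` occurs."""
    return sorted(
        step for ops in schedule.values() for step, o in ops.items() if o == op
    )


def never_idle(schedule):
    """True if each ancilla is acted on in every step of its lifetime."""
    return all(
        sorted(ops) == list(range(min(ops), max(ops) + 1))
        for ops in schedule.values()
    )


def one_cnot_per_step(schedule):
    """True if no two CNOTs in the ELU share a time step."""
    cnot_steps = steps_with(schedule, "cnot")
    return len(cnot_steps) == len(set(cnot_steps))


schedule = ancilla_schedule()
for op in OPS:
    print(op, steps_with(schedule, op))
print("never idle:", never_idle(schedule))
print("one CNOT per step:", one_cnot_per_step(schedule))
```

Under this assumption the Bell-pair halves occupy steps 1-3, the CNOTs steps 2-4 and the measurements steps 3-5, so the cluster qubit of the ELU takes part in at most one CNOT per step.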

What remains to complete the computation is the local measurement
of the 3D cluster state [43]. All remaining measurements
are performed in Step 5 of the above procedure. This
works trivially for cluster qubits intended for topological error
correction or the implementation of topologically protected
encoded Clifford gates [44], since these measurements require
no adjustment of the measurement basis. To avoid delay in the
measurement of qubits for the implementation of non-Clifford
gates, it is necessary to break the 3D cluster states into
overlapping slabs of bounded thickness.
Figure 6: Schedule for the creation of a 3D cluster state in the
MUSIQC architecture. Upper line: Schedule for Bell pair production
between ELUs representing face and edge qubits. Lower line:
Schedule for the CNOT gates within the ELUs corresponding to the
front faces of the lattice cell. Schedules for the ELUs on other faces
and on edges are similar.



Threshold analysis. A criterion for the error threshold of
measurement-based quantum computation with cluster states
that has been established numerically for a variety of error
models is

$$\langle K_{\partial q}\rangle(\{\text{error parameters}\}) = 0.70. \qquad (2)$$

Therein, $K_{\partial q}$ is a cluster state stabilizer operator associated
with the boundary of a single volume $q$, consisting of six
faces. Let $f$ be a face of the three-dimensional cluster, and
$K_f = \sigma_x^{(f)} \bigotimes_{e \in \partial f} \sigma_z^{(e)}$; see Fig. 5. Then,
$K_{\partial q} = \prod_{f \in \partial q} K_f = \bigotimes_{f \in \partial q} \sigma_x^{(f)}$.
Furthermore, for the above criterion to apply, all errors (for
preparation of local states, local and entangling unitaries, and
measurement) are propagated forward or backward in time, to
solely affect the 3D cluster state.

The above criterion applies for a phenomenological error
model with local memory error and measurement error (the
threshold error probability per memory step and measurement
is 2.9% [49]), for a gate-based error model (the threshold error
probability per gate is 0.67% [43]), and for further error models
with only low-order correlated error. Specifically, the criterion
(2) has been tested numerically for cluster state creation
procedures with varying relative strength of local vs. 2-local gate
error [43], with excellent agreement. In all cases, the error
correction was performed using Edmonds' perfect matching
algorithm.

The latter case covers the present situation. We have local
errors with strength $T/\tau_D$ and $\epsilon$, and 2-local errors with
strength $\epsilon$. The expectation value of the stabilizer operator
$K_{\partial q}$ in Eq. (2) is

$$\langle K_{\partial q}\rangle = \prod_{E \,\in\, \text{error sources}} \left[1 - 2p_-(E)\right]. \qquad (3)$$

Therein, $p_-(E)$ is the total probability of those Pauli errors in
the error source $E$ which, after (forward) propagation to the
endpoint of the cluster state creation procedure, anti-commute
with the stabilizer operator $K_{\partial q}$. The r.h.s. of Eq. (3) is simply
a product because of the statistical independence of the
individual error sources. Since the cluster state creation procedure
is of bounded temporal depth and built of local and
nearest-neighbor gates only, errors can only propagate a finite
distance. Therefore, only a finite number of error sources
contribute in Eq. (3). To linear order in $\epsilon$ and $T/\tau_D$, the result is

$$\langle K_{\partial q}\rangle = 1 - \frac{496}{5}\,\epsilon - 168\,\frac{T}{\tau_D}. \qquad (4)$$

In combination with Criterion (2), this yields the threshold
condition Eq. (1) from the main text.
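The product form of Eq. (3) and its linearization can be illustrated numerically. The following Python sketch is an illustration under stated assumptions rather than the authors' code: the list of probabilities p_-(E) is a toy input, the function names are hypothetical, and the comparison with 0.70 encodes Criterion (2) in the sense that lower error rates give a larger expectation value.

```python
import math


def stabilizer_expectation(p_minus):
    """Eq. (3): product of (1 - 2 p_-(E)) over independent error sources E."""
    return math.prod(1.0 - 2.0 * p for p in p_minus)


def linearized(p_minus):
    """First-order expansion of Eq. (3) in the error probabilities."""
    return 1.0 - 2.0 * sum(p_minus)


def is_below_threshold(p_minus, k_crit=0.70):
    """Criterion (2): error correction succeeds while <K> exceeds 0.70."""
    return stabilizer_expectation(p_minus) > k_crit


# Toy example: 100 independent sources, each anti-commuting with K
# with probability 1e-3 (an illustrative value, not taken from the text).
p_toy = [1e-3] * 100
print(stabilizer_expectation(p_toy), linearized(p_toy), is_below_threshold(p_toy))
```

For small probabilities the linearized value is a slight underestimate of the exact product, which is why keeping only linear terms in $\epsilon$ and $T/\tau_D$ is a conservative simplification.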

Details of counting the error sources. Here we derive
Eq. (4). To simplify the bookkeeping, we make the following
observations. (a) A Bell state preparation, two CNOT gates
(one on either side), and two local measurements on the qubits
of the former Bell pair (one in the Z- and one in the X-basis)
amount to a CNOT gate between the remaining participating
qubits. Therein, the qubit on the edge of the underlying lattice
is the target, and the qubit on the face is the control. We call this
teleported CNOT a link. (b) Errors can only propagate once
from a face qubit to an edge qubit or vice versa, but never farther
than that. To see this, consider e.g. a face qubit. There, an X-
or Y-error can get propagated (face = control of CNOTs). In
either case it causes an X-error on a neighboring edge qubit.
But X-errors are not propagated from edge qubits (edge = target
of all CNOTs). (c) The stabilizer $K_{\partial q}$ has only support on
face qubits, and is not affected by X-errors.

Based on these observations, we subdivide the error sources
affecting $\langle K_{\partial q}\rangle$ into three categories, namely Type 1: the first
Bell pair created on each face (according to the 5-step schedule);
Type 2: the CNOT links, consuming the remaining
Bell pairs; and Type 3: the final measurements of the cluster
qubits (one per ELU).

Type-2 contributions: For every CNOT link we only need
to count Z-errors (and $Y \cong Z$) on both the control (= face)
and target (= edge), because on the face qubit the Z-errors are
the ones that matter [by (c)], and on the edge qubit, such
errors may still propagate to a neighboring face qubit [by
(b)] and matter there. With these simplifications, the effective
error of each CNOT link between two neighboring ELUs is
described by the probabilities $p_{ZI}$ for a Z-error on the face
qubit, $p_{IZ}$ for a Z-error on the edge qubit, and $p_{ZZ}$ for the
combined error; and

$$p_{ZI} = 2\epsilon + \frac{10}{3}\frac{T}{\tau_D}, \qquad p_{IZ} = p_{ZZ} = \frac{4}{15}\,\epsilon + \frac{2}{3}\frac{T}{\tau_D}. \qquad (5)$$

Herein, we have only kept contributions up to linear order in
$\epsilon$, $T/\tau_D$. The contributions to the error come from (1) the Bell
pair, (2) a first round of memory error on all qubits, (3) the
CNOT gates, (4) a second round of memory error on all qubits,
and (5) the two local measurements per link.

Now we need to discuss the effect of each of the above gates
on $\langle K_{\partial q}\rangle$, taking into account propagation effects. For example,
consider the link established between the face qubit of a
front face $f$ and its left neighboring edge qubit $e$. (The Bell
pair for this link is created in Step 1, the required CNOTs are
performed in Step 2, and the local measurements in Step 3.)
The Z-error on $f$ does not propagate further. The Z-error
on $e$ is propagated in later steps to a neighboring face, cf.
Fig. 6. Thus, the errors $Z_f$ and $Z_e$ of this gate affect $\langle K_{\partial q}\rangle$,
and $Z_e Z_f$ does not. With Eq. (3), the gate in question reduces
$\langle K_{\partial q}\rangle$ by a factor of $1 - 2(p_{ZI} + p_{IZ}) = 1 - \frac{68}{15}\,\epsilon - 8\,T/\tau_D$.

The following links contribute: three for every face in $\partial q$
from within the cell, and three more per face of $\partial q$ from the
neighboring cells (links ending in an edge belonging to the
cell $q$ can affect $\langle K_{\partial q}\rangle$ by propagation). (i) Contributions
from within the cell. If a $Z_e$-error of the link propagates to
an even (odd) number of neighboring faces in $q$, the total error
probability affecting $\langle K_{\partial q}\rangle$ is $p_{ZZ} + p_{ZI}$ ($p_{IZ} + p_{ZI}$). But
since $p_{IZ} = p_{ZZ}$, all 18 contributions from within the cell $q$
are the same, irrespective of propagation. (ii) Contributions
from neighboring cells. Each of the 18 links in question contributes
an effective error probability $p_{IZ} + p_{ZZ}$ if an error on
the edge qubit of the link propagates to an odd number of face
qubits in $\partial q$. By inspection of Fig. 6, this happens for 6 links.
With Eq. (5), all the type-2 errors reduce $\langle K_{\partial q}\rangle$ by a factor of

$$1 - 160\,\frac{T}{\tau_D} - 88\,\epsilon. \qquad (6)$$

Type-1 contributions: Each of the initial Bell pair creations
carries a two-qubit gate error of strength $\epsilon$, and memory error
of strength $T/\tau_D$ on either qubit. Similar to the above case,
we can group the 15 possible Pauli errors into the equivalence
classes $I$, $Z_f$ ($Z_e Z_f \cong I$ and $Z_e \cong Z_f$ for Bell states). The
single remaining error probability, for $Z_f$, is

$$p_{ZI} = \frac{4}{15}\,\epsilon + \frac{2}{3}\frac{T}{\tau_D}. \qquad (7)$$

For each face of $\partial q$, there is one Bell pair within the face
that reduces $\langle K_{\partial q}\rangle$ by a factor of $1 - 2p_{ZI}$. Bell pairs from
neighboring cells do not contribute an error here. Thus, all the
type-1 errors reduce $\langle K_{\partial q}\rangle$ by a factor of

$$1 - 8\,\frac{T}{\tau_D} - \frac{16}{5}\,\epsilon. \qquad (8)$$

Type-3 contributions: The only remaining error source is
in the measurement of the one qubit per ELU which is part
of the 3D cluster state. The strength of the effective error on
each face qubit is $p_Z = \frac{2}{3}\epsilon$. Each of the six faces in $\partial q$ is
affected by this error. Thus, all the type-3 errors reduce $\langle K_{\partial q}\rangle$
by a factor of

$$1 - 8\,\epsilon. \qquad (9)$$

Again, only the contributions to linear order in $\epsilon$, $T/\tau_D$ were
kept.

Combining the contributions Eqs. (6), (8), and (9) of error types
1-3 yields Eq. (4) for the expectation value $\langle K_{\partial q}\rangle$.
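The bookkeeping of this appendix can be retraced with exact rational arithmetic. The sketch below is illustrative only. It assumes the link probabilities $p_{ZI} = 2\epsilon + \frac{10}{3}T/\tau_D$ and $p_{IZ} = p_{ZZ} = \frac{4}{15}\epsilon + \frac{2}{3}T/\tau_D$ for Eq. (5), a Bell-pair probability $\frac{4}{15}\epsilon + \frac{2}{3}T/\tau_D$ for Eq. (7), and $p_Z = \frac{2}{3}\epsilon$ for the measurements. The helper names are hypothetical.

```python
from fractions import Fraction as F


# A linearized quantity a*eps + b*(T/tau_D) is stored as the pair (a, b).
def add(x, y):
    return (x[0] + y[0], x[1] + y[1])


def scale(k, x):
    return (k * x[0], k * x[1])


# Assumed link error probabilities, Eq. (5).
p_ZI = (F(2), F(10, 3))
p_IZ = p_ZZ = (F(4, 15), F(2, 3))
# Assumed Bell-pair probability, Eq. (7), and measurement probability (type 3).
p_bell = (F(4, 15), F(2, 3))
p_meas = (F(2, 3), F(0))

# To linear order each source E lowers <K> by 2 * p_-(E), cf. Eq. (3).
link_inside = scale(2, add(p_ZI, p_IZ))    # each of the 18 links within the cell
link_outside = scale(2, add(p_IZ, p_ZZ))   # each of the 6 contributing outer links
type2 = add(scale(18, link_inside), scale(6, link_outside))
type1 = scale(6, scale(2, p_bell))         # one Bell pair per face, six faces
type3 = scale(6, scale(2, p_meas))         # one measurement per face, six faces
total = add(add(type1, type2), type3)

for name, (a, b) in [("type 1", type1), ("type 2", type2),
                     ("type 3", type3), ("total", total)]:
    print(f"{name}: 1 - ({a}) eps - ({b}) T/tau_D")
```

With these inputs the three partial results agree with Eqs. (6), (8) and (9), and their sum gives the coefficients 496/5 and 168 of Eq. (4).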



APPENDIX II: ANALYSIS OF FAULT-TOLERANCE FOR 
SLOW ENTANGLING GATES 

Here we show that scalable quantum computation can be
achieved for arbitrarily slow entangling gates. There is no
threshold for the ratio $\tau_E/\tau_D$ that needs to be reached.

To this end, the main idea is to construct a "hypercell" out
of several ELUs. A hypercell has the same storage capacity
for quantum information as a single ELU, but with the ability
to become (close to) deterministically entangled with four
other hypercells. Fault-tolerant universal quantum computation
can then be achieved by mapping to a 4-valent,
three-dimensional cluster state [43]. First, we show that arbitrarily
large ratios $\tau_E/\tau_D$ can be tolerated in the limiting case where
the gate error rate $\epsilon = 0$ (construction I). Then, we show how
to tolerate arbitrarily large ratios $\tau_E/\tau_D$ with finite gate errors
$\epsilon > 0$ (construction II).

Hypercell construction I is based on the snowflake design
[50, 51]; see Fig. 7(a). The difference is that in the present
case, each node in the connectivity tree represents an entire
ELU, not a single qubit as in [50, 51]. At the root of the
tree is an ELU that contains the qubit used in the computation,
while multiple layers of bifurcating branches lead to a
large "surface area" with many ports from which to attempt
entanglement generation between two trees. If two neighboring
surface areas are large enough, the probability of creating
a Bell pair between them via a probabilistic photonic link
approaches unity. Once a Bell pair is created, it can be converted
to a Bell pair between the root qubits A and B via teleportation;
see Fig. 7(b).

The links (each representing a Bell pair) within a snowflake
structure are created probabilistically, each with a probability
$p$ of heralded success. The success probability for the entire
structure is thus very small, but it is constant in the size of the
computation. Correspondingly, the operational cost of creating
a hypercell is large; but it is independent of the size of the
computation. The hypercell offers a qubit which can be
near-deterministically entangled with a constant number of other
qubits on demand. A quantum computer made up of such
hypercells can create a four-valent, 3D cluster state with few
missing qubits, and is thus fault-tolerant [43, 52, 61]. Hypercells
can readily be implemented in the modular ion trap
quantum computer since the probability of entanglement generation
does not depend on the physical distance between the
ELUs.

Figure 7: Hypercell construction I. (a) Snowflake design of [50, 51].
(b) Two hypercells connecting. If the surface area is large, with high
probability one or more Bell pairs are created between the surface
areas via the photonic link. By Bell measurements within individual
ELUs (indicated by ovals) one such Bell pair is teleported to the roots
A, B. (c) Boundary of the fault-tolerance region for gate error $\epsilon$ and
ratio $\tau_E/\tau_D$, for various ELU sizes. The threshold for the gate error
$\epsilon$ depends only weakly on $\tau_E/\tau_D$.

We call the part of the hypercell needed to connect to a
neighboring hypercell a "tree". For ELUs of coordination
number 3, the number $m$ of ports that are available to connect
two hypercells is twice the number of ELUs in the top layer
of the tree. The probability for all $m$ attempts to generate
entanglement between two trees to fail is
$P_{\rm fail} = (1-p)^m \approx \exp(-mp)$. (In practice, we will allow a
constant probability of failure which is tolerable in 3D cluster
states [52].) In addition, the number of ELUs in the top layer is
$2^{\#\text{layers}}$, and the path length $l$ (number of Bell pairs between
the roots) is $l = 2\log_2 m + 1$. Combining the above, we find that
$l = 2\log_2(c/p) + 1$, for $c = -\ln P_{\rm fail}$. For simplification we
assume that the time $t$ for attempting entanglement generation
is the same when creating the trees and when connecting the
trees. Then, $p = t/\tau_E$ in both cases. From the beginning of
the creation of the trees to completion of entangling two trees,
a time $2t$ has passed. The Bell pairs within the trees have
been around, on average, for a time $3t/2$, and the Bell pairs
between the two trees for an average time of $t/2$. If overall
error probabilities remain small, the total probability of error
for creating a Bell pair is proportional to $l$. The memory error
alone is

$$\epsilon_{\rm mem} = \frac{t}{\tau_D}\left[3\log_2\!\left(\frac{c\,\tau_E}{t}\right) + \frac{1}{2}\right]. \qquad (10)$$

This function is monotonically increasing with $t$, and
$\epsilon_{\rm mem}(t = 0) = 0$. The task now is to suppress the memory
error rate $\epsilon_{\rm mem}$ below the error threshold $\epsilon_{\rm crit}$ that applies
to fault-tolerant quantum computation with 3D cluster states.
From the threshold analysis of Appendix I we know that $\epsilon_{\rm crit} > 0$.
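The scaling argument above can be made concrete with a short numerical sketch. It is not part of the original analysis. It assumes the memory error $\epsilon_{\rm mem} = (t/\tau_D)[3\log_2(c\,\tau_E/t) + 1/2]$, which follows from $2\log_2 m$ Bell pairs stored for $3t/2$ and one stored for $t/2$. The failure probability 0.01, the threshold $10^{-3}$ and the ratio $\tau_E/\tau_D = 100$ are illustrative values, and the function names are hypothetical.

```python
import math


def ports_needed(p, p_fail):
    """Smallest m with (1 - p)**m <= p_fail."""
    return math.ceil(math.log(p_fail) / math.log(1.0 - p))


def memory_error(t, tau_E, tau_D, p_fail):
    """Assumed form of Eq. (10), with m = c*tau_E/t and c = -ln(p_fail)."""
    c = -math.log(p_fail)
    return (t / tau_D) * (3.0 * math.log2(c * tau_E / t) + 0.5)


def attempt_time_below(eps_crit, tau_E, tau_D, p_fail=0.01):
    """Halve the attempt time t until the memory error drops below eps_crit."""
    t = tau_E / 2.0
    while memory_error(t, tau_E, tau_D, p_fail) >= eps_crit:
        t /= 2.0
    return t


# Slow entangling gates: tau_E = 100 tau_D (illustrative numbers).
tau_E, tau_D, eps_crit, p_fail = 100.0, 1.0, 1e-3, 0.01
t = attempt_time_below(eps_crit, tau_E, tau_D, p_fail)
p = t / tau_E
print("t/tau_D =", t, " p =", p, " ports m =", ports_needed(p, p_fail))
```

The loop always terminates because $t\log_2(1/t)$ vanishes as $t \to 0$. The price is the rapidly growing number of ports $m \approx c\,\tau_E/t$, which is the operational cost discussed next.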

From Eq. (10) we find that, for any ratio $\tau_E/\tau_D$, we can
make $t$ small enough such that $\epsilon_{\rm mem} < \epsilon_{\rm crit}$. The operational
cost for creating a hypercell with sufficiently many ports is

9/2 c 

O (hypercell) ~ P • This cost is high for small p = 

$t/\tau_E$, but independent of the size of the computation. Thus,
whenever decoherence on waiting qubits is the only source of
error, scalable fault-tolerant QC is possible for arbitrarily slow
entangling gates.

We now discuss how the above hypercell construction I
fares in the presence of additional gate error $\epsilon$. We model every
noisy one-(two-)qubit operation by the perfect operation
followed by an SU(2)- (SU(4)-) invariant partial depolarizing
channel with strength $\epsilon$. Specifically, in the one-qubit channel,
the Pauli errors $\sigma_x$, $\sigma_y$, $\sigma_z$ each occur independently with a
probability $\epsilon/3$. For the two-qubit channel, each of the 15 possible
Pauli errors $\sigma_x^{(1)}, \sigma_x^{(2)}, \sigma_x^{(1)}\sigma_x^{(2)}, \ldots, \sigma_z^{(1)}\sigma_z^{(2)}$
occurs with a probability of $\epsilon/15$.
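The error model just described is easy to enumerate explicitly. The sketch below lists the Pauli errors of the one- and two-qubit depolarizing channels with the probabilities stated in the text. The string labels (such as "XI" for $\sigma_x^{(1)}$) and the function names are illustrative conventions, not notation from the paper.

```python
from fractions import Fraction
from itertools import product


def one_qubit_channel(eps):
    """Pauli errors X, Y, Z, each with probability eps/3."""
    return {pauli: eps / 3 for pauli in "XYZ"}


def two_qubit_channel(eps):
    """The 15 non-identity two-qubit Pauli errors, each with probability eps/15."""
    labels = ["".join(pq) for pq in product("IXYZ", repeat=2)]
    labels.remove("II")
    return {label: eps / 15 for label in labels}


def marginal_on_first(channel):
    """Probability that the first qubit suffers a non-identity Pauli error."""
    return sum(prob for label, prob in channel.items() if label[0] != "I")


eps = Fraction(15, 1000)          # illustrative gate error strength
channel = two_qubit_channel(eps)
print(len(channel), sum(channel.values()), marginal_on_first(channel))
```

Of the 15 two-qubit errors, 12 act non-trivially on a given qubit, so each qubit sees an effective error probability of $\frac{4}{5}\epsilon$ per two-qubit gate in this model.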

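If $\epsilon > 0$ then every entanglement swap adds error to the
computation. We must swap entanglement in every ELU on
the path between the roots A and B, and because there are
$2\log_2 m$ of them ($m > 2$), for $\epsilon \ll 1$ the total error is

$$\epsilon_{\rm total} = \frac{t}{\tau_D}\left[3\log_2\!\left(\frac{c\,\tau_E}{t}\right) + \frac{1}{2}\right] + 2\epsilon\log_2\!\left(\frac{c\,\tau_E}{t}\right). \qquad (11)$$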



Now it is no longer true that for any choice of $\tau_E/\tau_D$ we can
realize $\epsilon_{\rm crit} > \epsilon_{\rm total}$. A non-vanishing gate error sets an upper
limit to the tree depth, because the accumulated gate error is
proportional to the tree depth; see Fig. 7(b). This implies an upper
bound on the size of the top layer of the tree, which in turn implies
a lower bound on the time $t$ needed to attempt entangling the
two trees, cf. Eq. (12), and hence a lower bound on the
memory error caused by decoherence during the time interval
$t$. The accumulated memory error alone may be above or
below the error threshold, depending on the ratio $\tau_E/\tau_D$.

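In more detail, suppose that $\epsilon_{\rm crit} > \epsilon_{\rm total}$ holds. Considering
only gate errors, $\epsilon_{\rm crit} > 2\epsilon\log_2(c\,\tau_E/t)$, and hence,

$$t > c\,\tau_E\, 2^{-\epsilon_{\rm crit}/(2\epsilon)}. \qquad (12)$$

Now, recalling that $c\,\tau_E/t = m > 2$, with Eq. (11) we find that
$\epsilon_{\rm crit} > 3t/\tau_D + 2\epsilon$, or

$$t < \frac{1}{3}\left(\epsilon_{\rm crit} - 2\epsilon\right)\tau_D. \qquad (13)$$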
Figure 8: Hypercell construction II. (a) Lattice cell of a
three-dimensional four-valent cluster state. The dashed lines represent the
edges of the elementary cell and the full lines represent the edges of
the connectivity graph. (b) Creating probabilistic links between several
3D cluster states. (c) Reduction of a 3D cluster state to a 5-qubit
graph state, via Pauli measurements. The shaded regions represent
measurements of Z, the blank regions represent measurements of X.
The qubits represented as black dots remain unmeasured. For details,
see [43]. (d) Linking graph states by Bell measurements in the
remaining ELUs. Four-valent, 3D cluster states of arbitrary size can
be created.



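The two conditions Eq. (12) and (13) can be simultaneously
obeyed only if

$$\frac{\tau_E}{\tau_D} < \frac{\epsilon_{\rm crit} - 2\epsilon}{3c}\, 2^{\epsilon_{\rm crit}/(2\epsilon)}. \qquad (14)$$

We see that there is now an upper bound to the ratio $\tau_E/\tau_D$.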
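The interplay of the two bounds on $t$ can be checked numerically. The following sketch assumes the forms $t > c\,\tau_E\,2^{-\epsilon_{\rm crit}/(2\epsilon)}$ for Eq. (12) and $t < (\epsilon_{\rm crit} - 2\epsilon)\tau_D/3$ for Eq. (13). The numerical values of $\epsilon$, $\epsilon_{\rm crit}$ and $c$ are purely illustrative and the function names are hypothetical.

```python
def t_window(eps, eps_crit, tau_E, tau_D, c):
    """Lower bound (Eq. 12) and upper bound (Eq. 13) on the attempt time t."""
    t_min = c * tau_E * 2.0 ** (-eps_crit / (2.0 * eps))
    t_max = (eps_crit - 2.0 * eps) * tau_D / 3.0
    return t_min, t_max


def max_ratio(eps, eps_crit, c):
    """Upper bound on tau_E/tau_D implied by Eqs. (12) and (13), cf. Eq. (14).

    Note: for very small eps the power of two overflows a Python float.
    """
    return (eps_crit - 2.0 * eps) / (3.0 * c) * 2.0 ** (eps_crit / (2.0 * eps))


def window_exists(eps, eps_crit, tau_E, tau_D, c):
    t_min, t_max = t_window(eps, eps_crit, tau_E, tau_D, c)
    return t_min < t_max


eps, eps_crit, c = 1e-3, 1e-2, 4.6    # illustrative values, c = -ln(0.01)
bound = max_ratio(eps, eps_crit, c)
print("max tau_E/tau_D:", bound)
print(window_exists(eps, eps_crit, 0.9 * bound, 1.0, c))   # inside the bound
print(window_exists(eps, eps_crit, 1.1 * bound, 1.0, c))   # outside the bound
```

Because the bound grows like $2^{\epsilon_{\rm crit}/(2\epsilon)}$, it relaxes very quickly as the gate error decreases, consistent with the weak dependence of the threshold on $\tau_E/\tau_D$ noted in Fig. 7.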



Eq. (14) is a necessary but not sufficient condition for
fault-tolerant quantum computation using the hypercells of Fig. 7.

We have numerically simulated the process of constructing
the hypercells of type I, for various values of the decoherence
parameters $\epsilon$ and $\tau_E/\tau_D$. The boundary of the fault-tolerance
region in the ($\tau_E/\tau_D$, $\epsilon$)-plane is shown in Fig. 7(c). In the above,
for simplicity, we have considered hypercells in which all
constituent ELUs are entangled in a single timestep $t$. However,
there are various possible refinements. (1) The computational
overhead can be significantly decreased by creating the
hypercell in stages, starting with the leaves of the trees and
iteratively combining them to create the next layers [50]. (2)
Using numerical simulations it was found that if each of the
4 trees making up a hypercell has coordination number 4 or 5
rather than 3 (i.e., a ternary tree instead of a binary tree), the
overhead can be further reduced. These optimizations were
used to produce Fig. 7(c).

Hypercell construction II allows fault-tolerance for finite
gate errors $\epsilon > 0$. In construction I, the accumulated error for
creating a Bell pair between the roots A and B is linear in the
path length $l$ between A and B. This limits the path length $l$,
and thereby the surface area of the hypercell. This limitation
can be overcome by invoking three-dimensional (3D) cluster
states already at the level of creating the hypercell. 3D cluster
states have an intrinsic capability for fault-tolerance [43]
related to quantum error correction with surface codes [53, 54].
For the hypercell of type II we employ a 3D cluster state
nested within another 3D cluster state. Therein, the "outer"
cluster state is created near-deterministically from the
hypercells. Its purpose is to ensure fault-tolerance of the
construction. The "inner" 3D cluster state is created probabilistically.
Its purpose is to provide a means to connect distant qubits
in such a way that the error of the operation does not grow
with distance. Specifically, if the local error level is below the
threshold for error correction with 3D cluster states, the error
of (quasi-)deterministically creating a Bell pair between two
root qubits A and B in distinct 3D cluster states is independent
of the path length between A and B.

The construction is as follows. We start from a
three-dimensional grid with ELUs on the edges and on the faces.
Each ELU contains four qubits and can be linked to four
neighboring ELUs. Such a grid of ELUs (of suitable size)
is used to probabilistically create a 4-valent cluster state by
probabilistic generation of Bell pairs between the ELUs,
post-selection and local operations within the ELUs.

After such cluster states have been successfully created, in
each ELU three qubits are freed up, and can now be used for
near-deterministic links between different 3D cluster states;
see Fig. 8(b). After 4 probabilistic links to other clusters have
succeeded (the size of the cluster states is chosen such that
this is a likely event), the cluster state is transformed into
a star-shaped graph state via X and Z measurements; see
Fig. 8(c). This graph state contains 5 qubits, shared between
the 4 ELUs at which the successful links start, and an additional
ELU. Due to the topological error-correction capability
of 3D cluster states, the conversion from the 3D cluster state
to the star-shaped graph state is fault-tolerant [43]. By further
measurement in the ELUs, the graph states created in different
hypercells can now be linked, e.g. to form again a 4-valent
3D cluster state which is a resource for fault-tolerant quantum
computation [43]; see Fig. 8(d). This final linking step is prone
to error. However, the error level is independent of the size of
the hypercell, which was not the case for hypercell
construction I.

The only error sources remaining after error correction in
the 3D cluster stem from (i) the (two) ports per link, and (ii)
the two root qubits A and B, which are not protected topologically.
The total error $\epsilon_{\rm total}$ of a Bell pair created between A
and B in this case is given by $\epsilon_{\rm total} = c_1 t/\tau_D + c_2 \epsilon$, where $t$
is the time spent attempting Bell pair generation, and $c_1$ and
$c_2$ are algebraic constants which depend neither on the time
scales $\tau_E$ and $\tau_D$ nor on the distance between the root
qubits A and B. Then, if the threshold error rate $\epsilon_{\rm crit}$ for
fault-tolerance of the outer 3D cluster state is larger than $c_2 \epsilon$, we
can reach an overall error $\epsilon_{\rm total}$ below the threshold value $\epsilon_{\rm crit}$
by making $t$ sufficiently small. Smaller $t$ requires larger inner
3D cluster states, but does not limit the success probability for
linking type II hypercells. Thus, fault-tolerance is possible for
all ratios $\tau_E/\tau_D$, even in the presence of small gate errors.



APPENDIX III: ARCHITECTURE MODELS FOR 
QUANTUM CIRCUITS 

We have constructed simple models for quantum computer
architectures to estimate the execution time of useful algorithms.
In our simplified model, we assume that (1) the hardware is
capable of implementing a Steane [[7,1,3]] quantum error
correction code to one level of concatenation, and (2) all gate
operations are performed following fault-tolerant procedures.
This simplified model is designed to estimate the execution
time of the circuits in such an architecture, and is not intended to
provide a complete fault-tolerance analysis of the quantum
circuit. For this analysis, we therefore require that the physical
error levels are sufficiently low to produce the correct answer
with order-unity probability using only one level of concatenation
of the Steane code. The hardware is based on trapped ion
quantum computing, with the assumptions for the timescales
of the quantum operation primitives summarized in Table I.

Universal Fault-Tolerant QC using Steane Code 

We utilize the basic operational primitives of universal
quantum computation using the Steane [[7,1,3]] code [55], fully
outlined in Ref. [23] and summarized below.



1. The preparation of the logical qubit $|0\rangle_L$ is performed by
measuring the six stabilizers of the code using the four-qubit
cat state $|{\rm cat}\rangle_4 = (|0000\rangle + |1111\rangle)/\sqrt{2}$, following
the procedure that minimizes the use of ancilla qubits
as outlined in Ref. [56]. The stabilizer measurement
is performed up to three times to ensure that the error
arising from the measurement process itself can be corrected.
We perform a sequential measurement of the six
stabilizers, re-using the four ancilla qubits for each logical
qubit, which reduces the number of physical qubits
and parallel operations necessary for the state preparation
at the expense of the execution time. Once all
the stabilizers are measured, a three-qubit cat state is
used to measure the logical $Z_L$ operator to finalize the qubit
initialization process. This procedure requires eleven
physical qubits to complete the preparation of the logical qubit
$|0\rangle_L$.

2. In this code, all operations in the Pauli
group $\{X_L, Y_L, Z_L\}$ and the Clifford group
$\{H_L, S_L, {\rm CNOT}_L\}$ can be performed transversally
(i.e., in a bit-wise fashion). We assume seven parallel
operations are available, so that these logical operations
can be executed in one time step corresponding to
the single- or two-qubit operation. The transversal
${\rm CNOT}_L$ considered here is between two qubits that
are close by, so the operation can be performed locally
without further need for qubit communication.

3. In order to construct effective arithmetic circuits, we
need the Toffoli gate (a.k.a. ${\rm CCNOT}_L$), which is not in
the Clifford group. Since a transversal implementation
of this gate is not possible in the Steane code, a fault-tolerant
implementation requires the preparation of a special three
(logical) qubit state

$$|\phi_+\rangle_L = \frac{1}{2}\left(|000\rangle_L + |010\rangle_L + |100\rangle_L + |111\rangle_L\right), \qquad (15)$$

and "teleporting" the gate into this state [57]. This state
can be prepared by measuring its stabilizer operator using
a 7-qubit cat state on three logical qubits $|0\rangle_L$, as
shown in Fig. 9(a). Successful preparation of this state
requires a bitwise Toffoli gate (at the physical level),
which we assume can only be performed locally among
qubits that are close to one another. Once this state is
prepared, the three qubits $|x\rangle_L$, $|y\rangle_L$ and $|z\rangle_L$ participating
in the Toffoli gate can be teleported to execute
the gate, as shown in Fig. 9(b). Therefore, a successful
Toffoli gate operation requires 3 logical qubits (which
in turn require extra ancilla qubits to initialize) and 7
physical qubits as ancillary qubits, on top of the three
logical qubits on which the gate operates.

4. When a CNOT gate is necessary between two qubits
that are separated by large distances, we take the approach
where the two qubits of a maximally-entangled
state are each distributed to the vicinity of the two qubits,
and then the gate is teleported using the protocol proposed
in Ref. [58]. Efficient distribution of the entangled
states makes this approach much more effective
than one where the qubits themselves are transported
directly.

Figure 9: Circuit diagram for realizing a fault-tolerant Toffoli gate
using the Steane code. (a) The initial state $|\phi_+\rangle_L$ is prepared by measuring
the $X_1$ and ${\rm CNOT}_{12}$ of the three-qubit state $|0\rangle_1(|0\rangle_2 + |1\rangle_2)|0\rangle_3/\sqrt{2}$.
Note that the Toffoli gate shown here is a bitwise Toffoli between the
7-qubit cat state and the two logical qubit states. (b) Using the state
prepared in (a), the Toffoli gate can be realized using only measurement,
Clifford group gates and classical communication, all of which can
be implemented fault-tolerantly in the Steane code.

Table I: Assumptions on the timescales of quantum operation primitives used in the model.

Quantum primitive                 Operation time (µs)
Single-qubit gate                 1
Two-qubit gate                    10
Toffoli gate                      10
Qubit measurement                 100
Remote entanglement generation    10000
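The structure of the resource state in Eq. (15) can be made explicit with a small check at the level of computational-basis labels. The sketch below is illustrative and not taken from the paper: it verifies that the four terms of Eq. (15) are exactly those obtained by applying a Toffoli gate to $|{+}\rangle|{+}\rangle|0\rangle$, which is the standard reason this state allows the gate to be teleported. The dictionary representation and function names are assumptions made for illustration, and encoding into the Steane code is not modeled.

```python
from itertools import product


def plus_plus_zero():
    """Amplitudes of |+>|+>|0> in the computational basis."""
    return {(x, y, 0): 0.5 for x, y in product((0, 1), repeat=2)}


def apply_toffoli(state):
    """Flip the third bit whenever the first two bits are both 1."""
    return {(x, y, z ^ (x & y)): amp for (x, y, z), amp in state.items()}


phi_plus = apply_toffoli(plus_plus_zero())

# The four basis states appearing in Eq. (15), each with amplitude 1/2.
eq15 = {(0, 0, 0): 0.5, (0, 1, 0): 0.5, (1, 0, 0): 0.5, (1, 1, 1): 0.5}

print("matches Eq. (15):", phi_plus == eq15)
print("norm:", sum(amp * amp for amp in phi_plus.values()))
```

Each term has the form $|x, y, xy\rangle$, so the third qubit already carries the AND of the first two. Consuming the state with Clifford gates and measurements then transfers this correlation to the data qubits.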

Construction of Efficient Arithmetic Circuits 

The example quantum circuit we analyze is an adder circuit
that computes the sum of two n-bit numbers. Simple
adder circuits form the basis of more complex arithmetic circuits,
such as the modular exponentiation circuit at the heart
of Shor's factoring algorithm [59]. Quantum adder circuits
can be constructed using X, CNOT and Toffoli gates. Quantum
ripple-carry adders (QRCA) require minimal hardware
resources but feature a runtime of $O(n)$; an optimized version
was given by Cuccaro et al. [40]. More advanced adder circuits
are available that require additional hardware (qubits and parallel
operations) but dramatically reduce the runtime to $O(\log n)$
[60]. Here, we analyze the case of a quantum carry-lookahead
adder (QCLA) summarized by Draper et al. [41], which
dramatically outperforms the ripple-carry adders for n above
~100 in terms of execution time.

Practical implementation of large-scale QCLAs is hindered
by the requirement of executing Toffoli gates among
qubits that are separated by long distances within the quantum
computer. The MUSIQC architecture flattens the communication
cost between qubits in different ELUs, providing a
suitable platform for implementing QCLAs. Alternatively, the
nested entanglement swapping protocol proposed for quantum
teleportation can be used to efficiently distribute
maximally-entangled states in hardware where only local gates are
available, as long as a dedicated communication bus is provided.
Such a quantum logic array (QLA) architecture can also
effectively execute QCLAs [38].

MUSIQC Implementation 

In order to implement the QCLA circuit in the MUSIQC architecture,
each ELU should be large enough to accommodate
the generation of the $|\phi_+\rangle_L$ state shown in Fig. 9(a). This requires
a minimum of 3 logical qubits and a 7-qubit cat state,
and sufficient ancilla qubits to support the state preparation.
We balance the qubit resource requirements with computation
time by requiring four ancilla qubits per logical qubit, so that
the 4-qubit cat states necessary for the stabilizer measurement
can be created in parallel. Implementation of each Toffoli gate
is realized by allocating a fresh ELU and preparing the $|\phi_+\rangle_L$
state, then teleporting the three qubits from other ELUs into
this state. Once the gate is performed, the original logical
qubits from the other ELUs are freed up and become available
for another Toffoli gate. We find that 6n logical qubits placed
on 6n/4 = 1.5n ELUs are sufficient to compute the sum of two
n-bit integers using the QCLA circuit.

Teleportation of qubits into the ELU containing the prepared
$|\phi_+\rangle_L$ state requires generation of entangled states via
photon exchange. In order to minimize the entanglement generation
time, one should provide at least three optical ports
to connect to these ELUs in parallel. In order to successfully
teleport the gate, we need to create seven entangled pairs to
each ELU holding the input qubits. The entanglement generation
time can be reduced by running multiple optical ports
to other ELUs in parallel (we call this the port multiplexity
$m_p$). In a typical entanglement generation procedure, the ion
is prepared in an initial state, and then excited using a short
pulse laser (~1 ps). The ion emits a photon over a spontaneous
emission lifetime (~10 ns), and the photon detection process
will determine whether the entanglement generation from a
pair of such ions is successful. If the entanglement generation
is successful, the pair is ready for use in the computation.
If not, the ions will be re-initialized (~1 µs) and the process
is repeated. Since the initialization time of the ion is ~100
times longer than the time a photon is propagating in the optical
port, one can utilize multiple ions per optical port and
"pipeline" the photon emission process. In this
time-division-multiplex (TDM) scheme, another ion is brought into the optical
port to make another entanglement generation attempt
through the optical port while the initialization process is proceeding
for the unsuccessful ion. This process can be repeated
$m_T$ times using as many extra ions, before the first ion can be
brought back (we call $m_T$ the TDM multiplexity). Using the
port and TDM multiplexity, we can reduce the entanglement
generation time by a factor of $m_p m_T$.
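The effect of the two multiplexing factors can be sketched with a simple rate model. The model below is an illustrative assumption, not the authors' simulation: each attempt succeeds with a small heralded probability `p_success` (a hypothetical value), one attempt per ion is possible per re-initialization cycle, and the mean waiting time is the cycle time divided by the expected number of successes per cycle.

```python
import math


def max_tdm_multiplexity(init_time_s, emission_time_s):
    """Number of emission windows that fit into one re-initialization period."""
    return round(init_time_s / emission_time_s)


def mean_generation_time(cycle_time_s, p_success, m_p=1, m_T=1):
    """Mean time to a heralded success with m_p * m_T attempts per cycle.

    Valid when p_success * m_p * m_T is much smaller than one.
    """
    return cycle_time_s / (p_success * m_p * m_T)


init_time = 1e-6       # ~1 µs re-initialization, as quoted in the text
emission_time = 10e-9  # ~10 ns spontaneous emission lifetime
p_success = 1e-4       # hypothetical heralded success probability per attempt

base = mean_generation_time(init_time, p_success)
fast = mean_generation_time(init_time, p_success, m_p=2, m_T=10)
print("max TDM multiplexity:", max_tdm_multiplexity(init_time, emission_time))
print("speed-up for m_p = 2, m_T = 10:", base / fast)
```

In this model the speed-up is exactly $m_p m_T$, and the ratio of initialization time to emission lifetime caps the useful TDM multiplexity at about 100.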

In our example, we assume multiplexities $m_p = 2$ and
$m_T = 10$, which require 100 qubits (= 3×7 + 3×4 + 7 + 3×2×10)
and 12 parallel operations per ELU, as shown in Fig. ??a. This
choice adequately speeds up the communication time between
ELUs to balance out other operation times in the hardware.
Multiple ELUs are connected by an optical switch to complete
the MUSIQC hardware (Fig. 1(b)). With these resources,
an efficient implementation of the QCLA circuit can be realized by
executing all necessary logic gates in parallel. Under these
circumstances, the depth of the n-bit in-place adder circuit is
given by [41]
[Figure 10 panel labels: (a) Logic unit with 49 physical qubits in
7 × 7 square format: 4 logical qubits (28), ancilla qubits (20),
1 spare qubit; 12 parallel operations. (b) Logic block with 6 logic
units embedded in communication units: 24 logical qubits; 882
communication qubits (7 × 7 × 18); 441 parallel operations.]

Figure 10: Example of the QLA hardware considered. (a) Each logic
unit is made up of 49 physical qubits hosting four logical qubits and
the necessary ancilla qubits. (b) A logic block is six such logic units
embedded in communication units. Communication units are square
arrangements of 7 × 7 qubits, and eight such units fully surround the
logic unit.



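$$\lfloor \log_2 n \rfloor + \lfloor \log_2 (n-1) \rfloor + \left\lfloor \log_2 \frac{n}{3} \right\rfloor + \left\lfloor \log_2 \frac{n-1}{3} \right\rfloor + 14, \qquad (16)$$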

for sufficiently large n (n > 6), where $\lfloor x \rfloor$ denotes the largest
integer not greater than x. Out of these, two time steps contain
X gates, four contain CNOT gates, and the rest contain
Toffoli gates, which dominate the execution time of the circuit.
We assume an error correction step is performed on all
qubits after each time step, by measuring all stabilizers of the
Steane code and making necessary corrections based on the
measurement outcome.
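
As a worked illustration of the depth expression (16), the sketch below evaluates it for a given n and splits the result into X, CNOT, and Toffoli time steps as described above. The function names are hypothetical, and the floor of the base-2 logarithm is computed with integer bit lengths, which gives the same value as flooring the logarithm of n/3 for integer n.

```python
def floor_log2(x: int) -> int:
    """floor(log2(x)) for a positive integer x."""
    if x <= 0:
        raise ValueError("x must be positive")
    return x.bit_length() - 1


def qcla_depth(n: int) -> int:
    """Depth of the n-bit in-place QCLA circuit, following Eq. (16)."""
    if n <= 6:
        raise ValueError("Eq. (16) is stated for n > 6")
    return (floor_log2(n) + floor_log2(n - 1)
            + floor_log2(n // 3) + floor_log2((n - 1) // 3) + 14)


def toffoli_steps(n: int) -> int:
    """Time steps containing Toffoli gates: total minus 2 X and 4 CNOT steps."""
    return qcla_depth(n) - 2 - 4


if __name__ == "__main__":
    for bits in (128, 1024, 16384):
        print(bits, qcla_depth(bits), toffoli_steps(bits))
```

For a 128-bit adder this gives a depth of 37 time steps, of which 31 contain Toffoli gates.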



QLA Implementation 

QLA and its variations utilize dedicated communication
qubits to connect remote qubits in a time that grows logarithmically
with their separation. Even in the case of qubit arrays where
only nearest-neighbor qubit gates are allowed, this strategy can
be adopted to effectively implement QCLA adder circuits in
sub-polynomial time [38]. In this section, we consider a concrete
layout of a QLA device optimized for an n-bit adder with
one level of Steane [[7,1,3]] encoding.

In order to implement the fault-tolerant Toffoli gate described
in Fig. 9, one should assemble four logical qubits into
a single tight unit, as we did for the ELUs in the MUSIQC
architecture. In the QLA implementation, a "Logic Unit (LU)"
consists of a square of 49 (= 7 × 7) qubits, where a block of 12
(= 3 × 4) qubits forms a logical qubit with 7 physical qubits and
5 ancilla qubits (Fig. 10a). Just like in the MUSIQC example,
6n logical qubits placed on 1.5n LUs are necessary for adding
two n-bit numbers. Therefore, we organize six LUs into a logic
block (LB), capable of adding two 4-bit numbers. Each
LU in the LB is surrounded by eight 7 × 7 communication
units dedicated to distributing entanglement using the
quantum repeater protocol (Fig. 10b). We assume that the
communication of the qubits within each LU is "free", and do
not consider the time it takes for such communication. This
simplifying assumption is justified because the communication time
between LUs utilizing the qubits in the communication units
dominates the computation time, and it therefore does not change
the qualitative conclusion of this estimate.
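
The qubit bookkeeping for a logic unit and a logic block can be checked with a short calculation. This is a minimal sketch using only the counts stated above; the constant and function names are introduced here for illustration.

```python
# Counts taken from the text for one Steane-encoded logical qubit.
DATA_QUBITS_PER_LOGICAL = 7
ANCILLA_QUBITS_PER_LOGICAL = 5
LOGICAL_QUBITS_PER_LU = 4
SPARE_QUBITS_PER_LU = 1
LUS_PER_LB = 6
LOGICAL_QUBITS_PER_BIT = 6   # 6n logical qubits for an n-bit addition


def physical_qubits_per_lu() -> int:
    """Physical qubits in one logic unit (four 3 x 4 blocks plus a spare)."""
    block = DATA_QUBITS_PER_LOGICAL + ANCILLA_QUBITS_PER_LOGICAL
    return LOGICAL_QUBITS_PER_LU * block + SPARE_QUBITS_PER_LU


def logical_qubits_per_lb() -> int:
    """Logical qubits hosted by one logic block."""
    return LUS_PER_LB * LOGICAL_QUBITS_PER_LU


def adder_bits_per_lb() -> int:
    """Width of the addition that one logic block can host."""
    return logical_qubits_per_lb() // LOGICAL_QUBITS_PER_BIT


def lus_needed(n_bits: int) -> float:
    """Logic units needed for an n-bit addition (1.5 n)."""
    return LOGICAL_QUBITS_PER_BIT * n_bits / LOGICAL_QUBITS_PER_LU


if __name__ == "__main__":
    print(physical_qubits_per_lu(), logical_qubits_per_lb(),
          adder_bits_per_lb(), lus_needed(4))
```

The result reproduces the 7 × 7 = 49 qubits per LU, the 24 logical qubits per LB, and the 4-bit adder width per LB.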

Similar to the MUSIQC hardware example, a Toffoli gate
execution involves the preparation of the $|\phi_+\rangle_L$ state in
an "empty" LU, then teleporting three qubits onto this LU to
complete the gate operation. The execution time of the Toffoli
gate therefore consists of the time it takes to prepare
the $|\phi_+\rangle_L$ state, the time it takes to distribute entanglement
between adequate pairs of LUs, and the time it takes to teleport
the gate operation using the distributed entanglement. Among these, the
distribution time for the entanglement is a function of the distance
between the two LUs involved, while the other two are
independent of the distance.

The QCLA circuit involves various stages of Toffoli gates,
where the "distance" between qubits goes as $2^t$, where
$1 \le t \le \lfloor \log_2 n \rfloor$ [41]. In a 2D layout as considered in Fig. 10,
the linear distance between these two qubits goes as $\sqrt{2^t}$, in
units of the number of communication units that the entanglement
must be generated over. A slightly more careful analysis
shows that the linear distance is approximately given by
$d(t) = 3 \cdot 2^{t/2} + 1$ when t is even, and $d(t) = 2^{(t+1)/2} + 1$ when
t is odd. Since each communication unit has 7 qubits along each
side, the actual teleportation distance is $L(t) = 7 d(t)$ in
units of the length of the ion chain. The nested entanglement
swapping protocol can create entanglement between the two
end ions in $\lfloor \log_2 L(t) \rfloor$ time steps, where each time step consists
of one CNOT gate, two single-qubit gates, and one qubit
measurement process. Using the expression for d(t), we approximate
$\log_2 L(t) \approx t/2 + 4$ for both even and odd t, without
much loss of accuracy. Unlike in the case of MUSIQC,
the entanglement generation time is now dependent on the
distance between the qubits (although only in a logarithmic
way), and the resulting number of time steps needed for entanglement
distribution within the QCLA is (approximately) given by



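$$\frac{\lfloor \log_2 n \rfloor \left( \lfloor \log_2 n \rfloor + 17 \right)}{4} + \frac{\lfloor \log_2 (n-1) \rfloor \left( \lfloor \log_2 (n-1) \rfloor + 17 \right)}{4} + \frac{\left\lfloor \log_2 \frac{n}{3} \right\rfloor \left( \left\lfloor \log_2 \frac{n}{3} \right\rfloor + 17 \right)}{4} + \frac{\left\lfloor \log_2 \frac{n-1}{3} \right\rfloor \left( \left\lfloor \log_2 \frac{n-1}{3} \right\rfloor + 17 \right)}{4}. \qquad (17)$$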
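
Each term of Eq. (17) has the form T(T + 17)/4, which is what one obtains by summing the approximate per-stage cost t/2 + 4 over stages t = 1, ..., T. The sketch below checks this identity numerically and evaluates Eq. (17) for a given n. Reading each term as a sum over Toffoli stages is an interpretation made here for illustration, and the function names are hypothetical.

```python
def floor_log2(x: int) -> int:
    """floor(log2(x)) for a positive integer x."""
    if x <= 0:
        raise ValueError("x must be positive")
    return x.bit_length() - 1


def stage_sum(T: int) -> float:
    """Sum of the approximate per-stage cost t/2 + 4 for t = 1..T."""
    return sum(t / 2 + 4 for t in range(1, T + 1))


def closed_form(T: int) -> float:
    """Closed form T (T + 17) / 4 appearing in each term of Eq. (17)."""
    return T * (T + 17) / 4


def distribution_steps(n: int) -> float:
    """Approximate entanglement-distribution time steps, Eq. (17)."""
    if n <= 6:
        raise ValueError("the adder depth formula is stated for n > 6")
    stages = (floor_log2(n), floor_log2(n - 1),
              floor_log2(n // 3), floor_log2((n - 1) // 3))
    return sum(closed_form(T) for T in stages)


if __name__ == "__main__":
    assert all(stage_sum(T) == closed_form(T) for T in range(1, 21))
    print(distribution_steps(128))
```

The identity follows from the sum of t/2 being T(T + 1)/4 and the sum of the constant 4 being 4T = 16T/4.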
Table II: Summary of the resource estimation and execution times of
various adders in MUSIQC and QLA architecture.

Performance Metrics      QCLA on MUSIQC   QCLA on QLA   QRCA on NN
----------------------   --------------   -----------   ----------
Physical Qubits          150n             1,176n        20(n+1)
# Parallel Operations    18n              110n          8n + 43
Logical Toffoli Time     7,520            4,357 (a)     3,909
128-bit addition         0.35 s           0.24 s        1.0 s
1,024-bit addition       0.47 s           0.34 s        8.1 s
16,384-bit addition      0.64 s           0.47 s        130 s

(a) Does not include entanglement distribution time
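
The entries of Table II can be cross-checked against the per-ELU figures given earlier and against the ratios quoted in the comparison. The sketch below is illustrative only: the numbers are copied from the table and the text, and the dictionary and function names are introduced here.

```python
# Addition times in seconds, copied from Table II.
ADDITION_TIME_S = {
    128: {"MUSIQC": 0.35, "QLA": 0.24, "NN": 1.0},
    1024: {"MUSIQC": 0.47, "QLA": 0.34, "NN": 8.1},
    16384: {"MUSIQC": 0.64, "QLA": 0.47, "NN": 130.0},
}

# Coefficients of n in the resource rows of Table II.
QUBITS_PER_BIT = {"MUSIQC": 150, "QLA": 1176}
PARALLEL_OPS_PER_BIT = {"MUSIQC": 18, "QLA": 110}

# Per-ELU figures from the MUSIQC example (1.5 n ELUs for n bits).
ELUS_PER_BIT = 1.5
QUBITS_PER_ELU = 100
PARALLEL_OPS_PER_ELU = 12


def time_ratio(bits: int) -> float:
    """MUSIQC execution time relative to QLA for a given adder width."""
    row = ADDITION_TIME_S[bits]
    return row["MUSIQC"] / row["QLA"]


def resource_ratio(row: dict) -> float:
    """MUSIQC resource count relative to QLA."""
    return row["MUSIQC"] / row["QLA"]


if __name__ == "__main__":
    for bits in ADDITION_TIME_S:
        print(bits, round(time_ratio(bits), 2))
    print(round(resource_ratio(QUBITS_PER_BIT), 3))
    print(round(resource_ratio(PARALLEL_OPS_PER_BIT), 3))
```

The MUSIQC coefficients follow from the per-ELU numbers (1.5 × 100 = 150 and 1.5 × 12 = 18). The time ratios fall between roughly 1.36 and 1.46, and the resource ratios come out at about 13% for physical qubits and about 16% for parallel operations.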



It should be noted that in order to achieve this logarithmic
time, one has to have the ability to perform two-qubit gates
between every pair of qubits in all of the communication units
in parallel. The addition of two n-bit numbers requires n/4
LBs. Since each LB has 18 communication units, there are
a total of 7 × 7 × 18 = 882 communication qubits in an
LB. The number of parallel operations necessary is therefore
441 simultaneous CNOT operations per LB, or 441n/4 ≈
110n parallel operations for the n-bit QCLA. The number of X,
CNOT and Toffoli gates that have to be performed remains
identical to the MUSIQC case, since we are executing an identical
circuit. We assume that error correction is performed after
every logic gate, but that the entanglement distribution process
has high enough fidelity that no further distillation process
is necessary.



Comparison

Table II summarizes the resource requirements and performance
of the QCLA circuit on the MUSIQC and QLA architectures,
as well as the QRCA circuit on nearest-neighbor (NN)
quantum hardware, where multi-qubit gates can only operate
on qubits sitting right next to one another. Although the QLA
architecture considered in this example is also NN hardware,
the presence of the dedicated communication units (quantum bus)
allows remote gate operation with an execution time that depends
only logarithmically on the distance between qubits,
enabling fast execution of the QCLA. The cost in resources,
however, is significant: realization of an efficient communication
channel requires ~3 times as many physical qubits as are used
for storing and manipulating the qubits, and requires a large
number of parallel operations and the necessary control hardware
to run them. The QLA execution time is shorter than that of the
MUSIQC architecture because the photonic network in MUSIQC
establishes entanglement probabilistically, even though we have
dedicated substantial resources in MUSIQC to speed up the
entanglement generation time. Although the MUSIQC architecture
will take ~50% more time to execute the adder circuit,
the resources it requires to perform the same task are only about
15% of those required in the QLA architecture. In both cases,
we note the importance of moving qubits between different
parts of a large quantum computer.

Due to the large number of resources necessary to connect
different parts of the quantum computer together, the complexity
of the QLA hardware grows very quickly. It is difficult
to envision how to realize all the qubits and their control
hardware as the number of ions that have to intimately interact
increases. The MUSIQC architecture provides a more technologically
tractable approach to realizing a scalable quantum
computer, as the computer is broken up into smaller chunks
(in our example, ~100 qubits) that are more amenable to practical
realization.
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